ABSTRACT


WiFi access points are now present in nearly every part of daily life, and using them to
provide services other than Internet access is becoming common. This paper presents an
experimental study that estimates the number of people in an indoor environment using two
system setups: line of sight and non-line of sight. The relationship between the WiFi received
signal strength and the number of people is modeled with polynomial regression. The experiment
comprised two stages. In the first, data were collected from a controlled number of people and
used to train the system through polynomial regression. In the second, the system’s
effectiveness was tested in an uncontrolled environment. The test results show that WiFi
received signal strength can be used for people counting (up to 60), with the line of sight
system achieving an accuracy of 93.17%. The non-line of sight system showed randomness in the
received signal strength indicator regardless of changes in the number of people. This
randomness is mainly caused by the fading effect of the concrete wall. Therefore, the non-line
of sight system is inefficient in concrete buildings.
1. INTRODUCTION


In the past two decades, the use of the Internet has grown dramatically, and it has become one
of the necessities of modern life. Today, it is hard to find a building, or a floor in a building, that
does not have WiFi [1]. It has become natural to take advantage of these WiFi devices to provide
services other than Internet access, such as estimating the number of people in an area. In many
situations, it is necessary to know the number of people in a specific area in order to control that number or
to improve services [2]. Traditional people counting is in many cases complicated and requires
additional hardware, which makes it expensive, especially when the count drives an automated action
or a real-time application. Researchers have implemented people counting in different
ways. Some used cameras for the counting process, as in [3], while the authors in [4] relied
on sensors. Further, the researchers in [5] used devices carried by people, such as smartphones or
RFID, to estimate the number of people.

The WiFi signal typically undergoes many variations on its way to the receiver, depending
on multiple factors: the distance between the transmitter (router) and the receiver, furniture in the
path, and other obstacles, including humans. In our case, the estimation targets
closed rooms such as conference or lecture rooms inside universities. Because the furniture in such
indoor places is constant, the main factor that affects the WiFi signal is the people inside
the room.

In this work, we propose two system setups for estimating the people count: line of sight (LOS)
and non-line of sight (non-LOS). Both systems examine the WiFi
received signal strength indicator (RSSI) from the router installed in the room. Using the existing
WiFi access point is an efficient way to serve our purpose and requires no additional cost. In the LOS system,
the transmitting and receiving stations were placed in the same room, while in the non-LOS system, we
moved the receiver station to a neighboring room, so that there is no direct line of sight between the
transmitter and the receiver. The non-LOS system is considered in order to investigate how
eliminating the direct line-of-sight path affects the people counting
process. In the non-LOS environment, the received signal is only the superposition of
copies of the WiFi signal arriving over diverse paths (multipath fading), without any direct line-of-sight
component. Thus, the received signal is strongly affected by obstacles between the transmitter and the receiver,
including people. The two setups were compared to determine which one is more accurate
in the estimation process. The study is empirical: it used
the already installed WiFi access point as the transmitter and a NodeMCU development board, which contains
the ESP8266 WiFi SoC, as the receiver for RSSI examination and data collection [6].

The relationship between RSSI and obstacles is not perfectly linear, so polynomial regression
is a suitable way to model this kind of data distribution. Polynomial regression is therefore used
to model the relationship between the independent variable (in our case, the RSSI) and the dependent variable
(the number of people) as an nth-degree polynomial. We determine which polynomial degree
best fits the RSSI data distribution while avoiding both underfitting and overfitting.

Many procedures are used to estimate the number of people in a specific area. Some of them
rely on extra hardware, such as cameras, sensors, and smartphones [7]-[9], which adds
cost to the estimation process. Other procedures, like the one in this paper, use WiFi
to estimate the number of people [10]-[20]; however, they lack accuracy
and were tested with a small number of people. The authors in [21] proposed a framework to estimate the number
of people in a specific area based on the idea that the transmitted signal carries the signature of the people,
namely, blocking of the line of sight and propagation effects. By combining these two effects,
they developed a mathematical expression to describe the relationship between the amplitude of the received
signal and the total number of occupants. The paper tested with a small number of people (a maximum of 9),
and the error rate increased as the number of people increased. The authors in [22]
estimated the number of people by analyzing the Doppler spectrum of the received signal.
They observed that any increase in crowd density increases the Doppler spectrum peaks; again, the approach
was tested with a small number of people (a maximum of 7), and its crowd density accuracy is
not high enough. Another technique is used in [23], where the researchers rely on inter-event times to
count the total number of people. Their framework shows that the inter-event times carry significant
information for estimating the number of people in a specific area. The authors tested their framework
in more than 40 experiments in different areas and obtained highly accurate results, but again with a
small number of people (a maximum of 9). The researchers in [24] used a kNN classifier. For training,
they used one transmitting and two receiving nodes and collected RF data to form a feature
database. They then used the bootstrap resampling technique to optimize the time required
for training and to extract accurate features; this work achieved 94% accuracy with a maximum of 18 people.
The authors in [2] used linear regression to estimate the people count and achieved 77.2% accuracy with only
seven people. To summarize our work:

- The proposed method does not require people to carry any devices (no phones, sensors, or transmitters),
which makes it affordable and easier to implement.

- The proposed method is tested with a large number of people (up to 60), while other works used only 7, 9,
and 18 people.

- The proposed method does not require prior analysis, such as feature extraction, before the data is fed
into the main regression model.

- The proposed method achieved high accuracy in comparison to similar research works.


The rest of the paper is structured as follows. Section 2 explains the research method, including the
system setup, data collection, and estimation modeling. Section 3 presents the results obtained from the training
and the testing of our system. Finally, Section 4 states the conclusion.


2. RESEARCH METHOD 
2.1. Overview 


RSSI is usually affected by objects, moving and stationary, between the transmitter and the receiver.
These objects cause reflection, diffraction, and scattering. Multipath and the distance between the
transmitter and the receiver also affect signal strength significantly: as the distance increases, the signal
attenuation increases, which leads to signal degradation. The same applies to the number of people because,
like other objects, they degrade the signal; in other words, the more people in the room, the
greater the signal degradation. For example, the expected RSSI value when there are 5 people in a room
is larger than the RSSI value when there are 20 people in the same room.


During an event in the target room, the cumulative distribution function (CDF) of the RSSI was
measured in two environments, LOS and non-LOS, to study the
feasibility of people counting based on WiFi RSSI. The CDFs were obtained by collecting multiple RSSI
measurements from the target router via the proposed receiver system. The results revealed a strong
relationship between the number of people and the RSSI value; they are explained in detail
in the results section.


The experiment takes place in two stages. The first stage is data collection from a controlled
environment (controlling the number of people inside the room); the collected data are then used
to train the system through polynomial regression. The second stage tests the system’s effectiveness by
applying it in an uncontrolled environment.


2.2. Environment and system setup 


The system was tested in the conference room at the Technical College, Mosul-Iraq.
Table 1 shows the simulation environment parameters. Distance variation is not a factor
in this work because the distance between the WiFi access point and the
receiver station is fixed at 15 m.


Table 1. Simulation environment parameters

Room area            15 m x 8 m
Capacity             60 people
Construction type    Concrete
Transmitter          TP-Link Router (TL-WR841N)
Receiver             NodeMCU microcontroller
Furniture            Whiteboard, projector, wooden table, an iron door, 60 iron seats, and 6 split-type AC devices


Two system setups were considered, LOS and non-LOS (shown in Figure 1). The already installed
WiFi access point was used as the transmitter, while the NodeMCU was used as the receiver to collect the RSSI
data. We chose the NodeMCU because it is an open-source, interactive, programmable, low-cost,
low-power, WiFi-enabled board that is easily used to prototype IoT products. The NodeMCU collects the RSSI data
from the access point in the target room and then estimates the number of people
based on the collected RSSI information. To evaluate the system’s effectiveness, we set it up
in the conference room during regular daily work hours. The receiver
is programmed to take RSSI readings for 10 seconds, calculate their average, and then apply the proposed
estimation algorithm.
Figure 1. Layout of the conference room in the Technical College, Mosul-Iraq; area: 15 m x 8 m; contains 60
seats. (A: Receiver in LOS system setup; B: Receiver in non-LOS system setup)
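
The receiver behavior described in this subsection (sample the RSSI for 10 seconds, then average) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the firmware used in the study: the `read_rssi` callable, the 0.5 s sampling interval, and the injectable `clock` and `sleep` functions are hypothetical and introduced only for illustration.

```python
import time
from typing import Callable


def average_rssi(
    read_rssi: Callable[[], float],
    window_s: float = 10.0,
    interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Average the RSSI readings (dBm) taken over a fixed time window.

    read_rssi is a hypothetical callable that returns one RSSI reading in dBm;
    on real hardware it would wrap the receiver's own RSSI query.
    """
    readings = []
    start = clock()
    # Take at least one reading, then keep sampling until the window has elapsed.
    while True:
        readings.append(read_rssi())
        if clock() - start >= window_s:
            break
        sleep(interval_s)
    return sum(readings) / len(readings)
```

The clock and sleep functions are passed in so that the averaging logic can be exercised without waiting for real time to pass; the resulting average would then be fed to the regression model described in Section 2.4.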











2.3. Data collection 

Following the preliminary results, the relationship between the RSSI value and the number of people
was examined experimentally.
The receiver and the transmitter were placed at the same height in both systems.
The experiments then took place on ten sets of people: the first set included five people, the second
ten, and so on, with each set adding five people until
the tenth set, which had fifty. In other words, the number of people was increased gradually from 5 to 50
in steps of five.

In each set, we asked the people to sit in 10 random sitting positions, to
mimic the randomness of how people sit as much as possible. In each sitting position, the RSSI value was
measured ten times, and the average of these ten readings was taken. This gives ten average readings
for each set of people, one per sitting position. Finally, one overall average RSSI value was
computed as the aggregate of those ten averages. In other words, each set of people
yields 100 RSSI values, as illustrated in Figure 2.


[Figure 2 diagram: a set of people is split into sitting positions 1 to 10; each sitting position has RSSI
readings 1 to 10 that give an average RSSI reading; the ten averages give one aggregate average RSSI reading.]

Figure 2. Set of people with 10 random sitting positions; each position has an average RSSI, and then an
aggregate average will correspond to the number of people in that set
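
The two-level averaging described in this subsection can be sketched in a few lines of Python. The readings below are hypothetical and use 3 positions with 4 readings each for brevity (the study used 10 positions with 10 readings each); the function name `aggregate_rssi` is likewise introduced only for illustration. Following the text, the dBm values are averaged arithmetically.

```python
from statistics import mean
from typing import Sequence


def aggregate_rssi(readings_by_position: Sequence[Sequence[float]]) -> float:
    """Return the aggregate average RSSI (dBm) for one set of people.

    Each inner sequence holds the readings taken at one sitting position.
    """
    # First level: one average per sitting position.
    position_averages = [mean(readings) for readings in readings_by_position]
    # Second level: the aggregate of the per-position averages.
    return mean(position_averages)


# Hypothetical readings in dBm, NOT the measurements from the study.
example = [
    [-50, -52, -51, -51],
    [-54, -53, -55, -54],
    [-49, -50, -50, -51],
]
print(aggregate_rssi(example))
```

Because every position contributes the same number of readings, the aggregate equals the plain mean of all readings in the set; the two-level form simply mirrors the procedure in Figure 2.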


2.4. Estimation modeling 

The empirically obtained RSSI data are used to train the estimation model using polynomial regression.
Polynomial regression was chosen because of the non-linear relationship between the independent variable
(RSSI) and the dependent variable (number of people). First-, second-, and third-degree polynomials of the form (1), from
[25], were applied in the estimation modeling process.


\[ y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots + a_n x^n + \varepsilon \qquad (1) \]

where $a_0, a_1, a_2, a_3, \dots, a_n$ are the polynomial coefficients, $x$ is the RSSI (independent variable), $y$ is the estimated
number of people (dependent variable), $n$ is the degree of the polynomial, and $\varepsilon$ is an unobserved random error
with a zero mean conditioned on the independent variable $x$.
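
As an illustration of the fitting step, the sketch below fits first-, second-, and third-degree polynomials with NumPy's `Polynomial.fit`, which performs a least-squares fit on an internally rescaled domain. The RSSI values are made up for the example and are not the measurements from the study; the helper name `fit_people_model` is also an assumption.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Illustrative values only, NOT the measurements from the study.
people = np.arange(5, 55, 5)  # 5, 10, ..., 50 people
rssi_dbm = np.array([-42.0, -45.5, -48.0, -50.0, -52.5,
                     -54.0, -56.5, -58.0, -60.5, -62.0])


def fit_people_model(rssi, count, degree):
    """Least-squares fit of count = a_0 + a_1*x + ... + a_n*x^n with x = RSSI."""
    return Polynomial.fit(rssi, count, degree)


# One fitted model per polynomial degree considered in the text.
models = {d: fit_people_model(rssi_dbm, people, d) for d in (1, 2, 3)}

for degree, model in models.items():
    estimate = float(model(-55.0))  # estimated number of people at -55 dBm
    print(f"degree {degree}: {estimate:.1f} people")
```

A fitted model is called like a function, so `models[3](average_rssi_value)` would return the estimated people count for a new averaged RSSI reading.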
To calculate the total error rate of our model, we compute it based on the difference
between the estimated and the real number of people. Equation (2) is used to calculate the total error rate.

\[ \text{Total Error Rate} = \frac{\sum_{n=1}^{N} \lvert R_n - E_n \rvert}{N} \qquad (2) \]

where $R_n$ is the real number of people currently in the room, $E_n$ is the estimated number of people, and $N$ is
the total number of investigated cases.
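
A minimal sketch of the total error rate follows, assuming that (2) is the sum of the absolute differences between real and estimated counts divided by the number of investigated cases. The function name and the example counts are hypothetical.

```python
from typing import Sequence


def total_error_rate(real: Sequence[float], estimated: Sequence[float]) -> float:
    """Mean absolute difference between real and estimated people counts."""
    if len(real) != len(estimated) or not real:
        raise ValueError("real and estimated must be non-empty and of equal length")
    return sum(abs(r - e) for r, e in zip(real, estimated)) / len(real)


# Hypothetical counts for four investigated cases.
real_counts = [15, 21, 31, 40]
estimated_counts = [14, 21, 32, 38]
print(total_error_rate(real_counts, estimated_counts))  # (1 + 0 + 1 + 2) / 4 = 1.0
```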


3. RESULTS AND DISCUSSION

Figure 3a shows the CDF of the LOS environment. The RSSI values range
from a maximum of -31 dBm to a minimum of -72 dBm over time. These variations are due
to reflection, diffraction, and scattering, in addition to fading, all of which arise from
moving and stationary objects between the transmitter and the receiver. The LOS CDF shows that approximately
34% of the time the router operates in the RSSI region between -30 dBm and -60 dBm, while 66% of
the time it operates below -60 dBm. Only 4% of the time does the router operate at the minimum
RSSI value.

Figure 3b shows the CDF of the non-LOS environment. The RSSI values
range from a maximum of -33 dBm to a minimum of -75 dBm over time. The effects of reflection,
diffraction, scattering, and fading are stronger in the non-LOS environment. This increase is due to the
additional obstacle, a concrete wall in this case, between the transmitter and the receiver. The
non-LOS CDF shows that approximately 29% of the time the router operates in the RSSI region
between -33 dBm and -60 dBm, while 71% of the time it operates below -60 dBm. Only
8% of the time does the router operate at the minimum RSSI value.

Figure 3 shows that people counting is possible, as the RSSI value rises and falls
as the number of people changes. The minimum RSSI values were received 4% and 8% of the time for
the LOS and non-LOS systems, respectively. In our case, since the furniture and other obstacles between
the transmitter and the receiver are stationary, the fluctuations in RSSI value mainly depend on variations in the number
of people. In other words, the number of people in the room changes, and the room is not always
full.
Figure 3. Cumulative distribution function (CDF) of RSSI. It can be observed that the RSSI is changing over
time which means that the number of people has an impact on RSSI. The figure shows (a) LOS environment,
(b) non-LOS environment
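
The shares read off the CDFs (for example, the fraction of time the RSSI is at or below -60 dBm) can be computed from raw samples with an empirical CDF. The sketch below uses hypothetical samples, not the study's data, and the function name `empirical_cdf` is introduced for illustration.

```python
from bisect import bisect_right
from typing import Sequence


def empirical_cdf(samples: Sequence[float], threshold: float) -> float:
    """Fraction of samples that are less than or equal to threshold."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # bisect_right counts how many sorted samples are <= threshold.
    return bisect_right(ordered, threshold) / len(ordered)


# Hypothetical RSSI samples in dBm, NOT the measurements from the study.
rssi_samples = [-31, -45, -52, -58, -60, -61, -63, -66, -70, -72]

share_weak = empirical_cdf(rssi_samples, -60)  # share of samples at or below -60 dBm
share_min = empirical_cdf(rssi_samples, min(rssi_samples))  # share at the minimum value
print(share_weak, share_min)
```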


Regarding the non-LOS system, the obtained results revealed a large margin of fluctuation and randomness.
For example, the average RSSI obtained when there were 35 people in the room was -49 dBm, while
when there were 20 people in the room, the average RSSI was -50 dBm (Figure 4). The same applies when
there were 5 and 15 people in the room. The fading caused by the concrete wall appears to have a
very random impact on the RSSI value and adds a lot of noise. Such random behavior
is unpredictable, very hard to model, and yields a noticeable error rate. For this reason,
we decided not to pursue the non-LOS system further.
Figure 4. Non-LOS system results; the concrete wall adds a noticeable amount of noise to the RSSI


Figure 5 shows the results obtained from the LOS system, which are used in the system
training process. The results show a slight change in the RSSI within each set of people. This change has
several causes, for example, people moving in their seats and the movement of the fans circulating
air in the room. Moreover, the results show a noticeable change in the RSSI when the number of people
changes. It is worth mentioning that the RSSI values overlap between some sets,
for example, between 15 and 20 people and between
30 and 35 people. The effect of this overlap was reduced by taking the average of
the RSSI values that belong to each set of people.
Figure 5. Results obtained from a controlled number of people in the LOS environment. A noticeable change in
the RSSI is clear when the number of people changes


Figure 6 shows the system performance over two regular consecutive workdays during
events that took place in the conference room. The figure presents the ground truth together with the
values estimated by the different polynomial degrees, as well as the difference between the ground
truth and the value estimated by each polynomial degree. This difference indicates the
accuracy provided by each polynomial degree: the closer the difference is to zero, the more accurate
the estimation. The obtained estimation accuracies are as follows: the 1st-degree polynomial
gave an accuracy of 90.5%, the 2nd-degree polynomial 92.08%, and the
3rd-degree polynomial 93.17%. We also tested the
4th-, 5th-, 6th-, and 7th-degree polynomials, which gave accuracies of 93.27%, 92.59%, 91.1%, and 90%, respectively.
The 4th-degree polynomial showed only a very slight improvement over
the 3rd-degree polynomial, and the accuracy decreased as the polynomial degree increased further.
Therefore, we stopped at the 3rd-degree polynomial as our final estimation model because it offers
high accuracy while avoiding overfitting.
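
One way to make this trade-off explicit is to pick the lowest polynomial degree whose accuracy lies within a small tolerance of the best one. The sketch below applies that rule to the accuracies reported above; the rule itself and the 0.5 percentage-point tolerance are assumptions for illustration, not the selection procedure stated by the authors.

```python
# Accuracies (%) reported in the text for polynomial degrees 1 to 7.
reported_accuracy = {1: 90.5, 2: 92.08, 3: 93.17, 4: 93.27, 5: 92.59, 6: 91.1, 7: 90.0}


def pick_degree(accuracy_by_degree, tolerance=0.5):
    """Return the lowest degree whose accuracy is within `tolerance`
    percentage points of the best accuracy."""
    best = max(accuracy_by_degree.values())
    return min(d for d, acc in accuracy_by_degree.items() if best - acc <= tolerance)


print(pick_degree(reported_accuracy))  # 3
```

With this tolerance, degrees 3 and 4 both qualify and the lower one is preferred, which matches the choice of the 3rd-degree polynomial.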

Table 2 compares our model with similar research works. We compared the accuracy
of our proposed method against the accuracy of other methods presented within the last five years.
Some of the methods in the table originally reported accuracy as an error in the number of
people. For easier comparison, we normalized the accuracies so that they are all expressed as percentages. The
accuracy of our proposed method is higher than that of four out of the five approaches that we compared against. The
kNN classifier method has higher accuracy than our proposed method but with fewer people: it used
only 18 people, while we used 60 people to calculate the accuracy. In the case of 21 people, we achieved an
accuracy of approximately 99% (Figure 6).
Figure 6. System evaluation; the estimated number of people vs the real number; the difference for the
3rd-degree polynomial is the nearest to zero in most cases, which means that it has the best accuracy


Table 2. Our proposed approach vs. similar research works

Ref                    Accuracy   No. of people   Way of modeling
Our proposed method    93.27%     60              Polynomial regression
[2]                    77.2%      7               Linear regression
[21]                   77.71%     9               Mathematical expression with Kullback-Leibler divergence
[22]                   86%        7               Doppler spectrum analysis
[23]                   77.71%     9               PME of the inter-event times
[24]                   94%        18              kNN classifier


To check that our system can be applied to different environments, we ran a system test in a different
scenario. We moved the system to a lecture hall in another building of the college. The tests ran in
three classes with 15, 21, and 31 people, respectively.
Each test ran for two hours, with the system programmed to provide an estimate every 15 minutes. The test results
(Figure 7) showed the robustness of the proposed system, with an accuracy of ±1 person in all
three cases.
Figure 7. Ground truth vs estimation; three tests, the first with 15 people, the second with 21
people, and the last with 31 people (±1 person is the error rate)
4. CONCLUSION


In this work, two people counting strategies were studied: LOS and non-LOS systems. The non-LOS
system fluctuated, was unpredictable, and showed a high margin of error. In contrast, the LOS
system showed robustness, predictable counting, and an acceptable margin of error. Because
of the nonlinear relationship between RSSI and the number of people, polynomial regression achieved high
accuracy in system modeling. We achieved the two main goals of this paper. The first is a low-cost people
counting strategy that suits real-time applications. The second is the use of a large number of people in the
training process (50) and the estimation process (60). The LOS system showed a high degree of accuracy
in estimating the number of people in an indoor environment, which makes it applicable to real-time
applications. As future research, we plan to use our people counting method in real-time applications.